Although Transformers and their Conformer variants have shown promising performance in speech recognition, their heavily parameterized nature leads to large memory costs during training and inference. Some works use cross-layer weight sharing to reduce the number of model parameters, but the inevitable loss of capacity hurts model performance. To address this issue, this paper proposes a parameter-efficient Conformer via sharing sparsely-gated experts. Specifically, we use sparsely-gated mixture-of-experts (MoE) to extend the capacity of a Conformer block without increasing computation. Then, the parameters of grouped Conformer blocks are shared to reduce the number of parameters. Next, to ensure that the shared blocks remain flexible enough to adapt representations at different levels, we design the MoE routers and normalization layers separately for each block. In addition, we use knowledge distillation to further improve performance. Experimental results show that, compared with the full-parameter model, the proposed model achieves competitive performance with 1/3 of the encoder parameters.
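To make the sharing scheme concrete, below is a minimal PyTorch sketch (not the authors' code) of a sparsely-gated MoE feed-forward layer whose expert pool is shared across a group of blocks while each block keeps its own router and layer normalization; all module names, the top-1 gating, and the dimensions are illustrative assumptions.

```python
# Illustrative sketch (not the authors' code): a sparsely-gated MoE feed-forward
# layer whose experts are shared across a group of Conformer blocks, while each
# block keeps its own router and layer normalization.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedExperts(nn.Module):
    """Expert feed-forward networks shared by every block in a group."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

class MoEFeedForward(nn.Module):
    """Per-block top-1 router and LayerNorm over a shared expert pool."""
    def __init__(self, shared: SharedExperts, d_model: int):
        super().__init__()
        self.shared = shared                                    # shared across the group
        self.router = nn.Linear(d_model, len(shared.experts))   # block-specific router
        self.norm = nn.LayerNorm(d_model)                       # block-specific normalization

    def forward(self, x):                          # x: (batch, time, d_model)
        h = self.norm(x)
        gate = F.softmax(self.router(h), dim=-1)   # (B, T, num_experts)
        top_w, top_idx = gate.max(dim=-1)          # top-1 sparse gating
        out = torch.zeros_like(h)
        for e, expert in enumerate(self.shared.experts):
            mask = top_idx == e
            if mask.any():
                out[mask] = top_w[mask].unsqueeze(-1) * expert(h[mask])
        return x + out                             # residual connection

# Usage: one expert pool serves a group of blocks with separate routers/norms.
shared = SharedExperts(d_model=256, d_ff=1024, num_experts=4)
blocks = [MoEFeedForward(shared, 256) for _ in range(3)]
y = torch.randn(2, 50, 256)
for blk in blocks:
    y = blk(y)
```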
Graph-structured data often has a dynamic character in nature, e.g., the addition of links and nodes, in many real-world applications. Recent years have witnessed increasing attention paid to dynamic graph neural networks for modeling such graph data, where almost all existing approaches assume that, when a new link is established, the embeddings of the neighbor nodes should be updated by learning the temporal dynamics to propagate the new information. However, such approaches suffer from the limitation that if the node introduced by the new link contains noisy information, propagating its knowledge to other nodes becomes unreliable and may even lead to model collapse. In this paper, we propose AdaNet: a robust knowledge adaptation framework via reinforcement learning for dynamic graph neural networks. In contrast to previous approaches, which immediately update the embeddings of neighbor nodes once a new link is added, AdaNet attempts to adaptively determine which nodes should be updated because of the newly involved link. Considering that the decision of whether to update the embedding of one neighbor node will have a great impact on other neighbor nodes, we formulate the selection of nodes to update as a sequential decision problem and solve it via reinforcement learning. In this way, knowledge can be adaptively propagated to other nodes to learn robust node embedding representations. To the best of our knowledge, our approach constitutes the first attempt to explore robust knowledge adaptation via reinforcement learning for dynamic graph neural networks. Extensive experiments on three benchmark datasets demonstrate that AdaNet achieves state-of-the-art performance. In addition, we perform experiments by adding different degrees of noise to the datasets, quantitatively and qualitatively illustrating the robustness of AdaNet.
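A minimal sketch of the core idea, assuming a simple Bernoulli policy and a placeholder update rule (not the paper's implementation): when a new link arrives, a learned policy decides per neighbor whether to propagate the new node's information, and its log-probabilities can then feed a REINFORCE-style objective.

```python
# Illustrative sketch (not the paper's implementation): framing "which neighbor
# embeddings to update after a new link" as a sequential decision problem.
# A small policy network scores each affected neighbor and samples update/skip
# actions; the reward definition and update rule below are assumptions.
import torch
import torch.nn as nn

class UpdatePolicy(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # scores a (neighbor, new-node) pair -> probability of updating the neighbor
        self.scorer = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, neighbor_emb, new_node_emb):
        pair = torch.cat([neighbor_emb, new_node_emb.expand_as(neighbor_emb)], dim=-1)
        return torch.sigmoid(self.scorer(pair)).squeeze(-1)   # update probability per neighbor

def adapt_embeddings(policy, neighbors, new_node, update_fn):
    """Decide, neighbor by neighbor, whether to propagate the new information."""
    probs = policy(neighbors, new_node)
    actions = torch.bernoulli(probs)                 # 1 = update, 0 = keep old embedding
    log_probs = torch.where(actions.bool(), probs, 1 - probs).log()
    updated = torch.where(actions.unsqueeze(-1).bool(),
                          update_fn(neighbors, new_node), neighbors)
    return updated, log_probs                        # log_probs feed the RL objective

# Toy usage with a naive averaging update rule (purely for illustration).
dim = 16
policy = UpdatePolicy(dim)
neighbors, new_node = torch.randn(5, dim), torch.randn(1, dim)
updated, log_probs = adapt_embeddings(
    policy, neighbors, new_node,
    update_fn=lambda nb, new: 0.5 * (nb + new.expand_as(nb)))
# loss = -(reward * log_probs).mean()   # reward could be, e.g., a validation accuracy gain
```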
Knowledge distillation (KD) has demonstrated its effectiveness in boosting the performance of graph neural networks (GNNs), where the goal is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, due to the well-known over-parameterization and over-smoothing issues, it is often difficult to train a satisfactory teacher GNN in practice, leading to ineffective knowledge transfer in real applications. In this paper, we propose the first free-direction knowledge distillation framework via reinforcement learning for GNNs, called FreeKD, which no longer requires a deeper, well-optimized teacher GNN. The core idea of our work is to collaboratively build two shallower GNNs that exchange knowledge with each other via reinforcement learning in a hierarchical way. As we observe that a typical GNN model often performs better on some nodes and worse on others during training, we devise a dynamic and free-direction knowledge transfer strategy consisting of two levels of actions: 1) a node-level action determines the direction of knowledge transfer between the corresponding nodes of the two networks; and 2) a structure-level action determines which of the local structures generated by the node-level actions should be propagated. In essence, our FreeKD is a general and principled framework that is naturally compatible with GNNs of different architectures. Extensive experiments on five benchmark datasets demonstrate that FreeKD outperforms the two base GNNs by a large margin and is effective for various GNNs. More surprisingly, FreeKD achieves comparable or even better performance than traditional KD algorithms that distill knowledge from a deeper and stronger teacher GNN.
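The node-level action can be pictured as follows; this is a hedged sketch under assumed choices (temperature, KL direction, and a heuristic in place of the RL agent), not the FreeKD implementation.

```python
# Illustrative sketch (not the authors' code): the node-level part of a
# free-direction exchange between two shallow GNNs. For each node, an action
# picks which network acts as the "teacher" for that node, and a soft KL
# distillation term is applied in that direction. The structure-level action
# is omitted for brevity.
import torch
import torch.nn.functional as F

def node_level_distill(logits_a, logits_b, direction, T: float = 2.0):
    """direction[i] = 1 means network A teaches B at node i; 0 means B teaches A."""
    p_a = F.log_softmax(logits_a / T, dim=-1)
    p_b = F.log_softmax(logits_b / T, dim=-1)
    # KL(teacher || student) per node, computed in both directions
    kl_a_to_b = F.kl_div(p_b, p_a.exp(), reduction="none").sum(-1)
    kl_b_to_a = F.kl_div(p_a, p_b.exp(), reduction="none").sum(-1)
    per_node = torch.where(direction.bool(), kl_a_to_b, kl_b_to_a)
    return (T * T) * per_node.mean()

# Toy usage: a simple heuristic direction (lower per-node loss => acts as teacher);
# in FreeKD this choice is made by a reinforcement learning agent instead.
num_nodes, num_classes = 8, 3
labels = torch.randint(0, num_classes, (num_nodes,))
logits_a, logits_b = torch.randn(num_nodes, num_classes), torch.randn(num_nodes, num_classes)
loss_a = F.cross_entropy(logits_a, labels, reduction="none")
loss_b = F.cross_entropy(logits_b, labels, reduction="none")
direction = (loss_a < loss_b).long()
distill_loss = node_level_distill(logits_a, logits_b, direction)
```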
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold: First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modifications. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shot counts, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30 shots. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and models will be available.
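Below is a rough PyTorch sketch of the two reference steps the abstract describes, namely mask-pooled dynamic class centers that re-weight query features, and cross-attention linking object queries to support queries; shapes and module names are assumptions, not the released RefT code.

```python
# Illustrative sketch (not the released code) of the two "reference" steps:
# (1) pool support features under support masks into a dynamic class center and
#     use it to re-weight query features (feature-level);
# (2) let query-image object queries attend to support object queries (instance-level).
import torch
import torch.nn as nn

def masked_class_centers(support_feats, support_masks):
    # support_feats: (S, C, H, W), support_masks: (S, H, W) in {0, 1}
    m = support_masks.unsqueeze(1)                                            # (S, 1, H, W)
    centers = (support_feats * m).sum(dim=(2, 3)) / m.sum(dim=(2, 3)).clamp(min=1e-6)
    return centers.mean(dim=0)                                                # (C,) per class

def reweight_query_features(query_feats, center):
    # query_feats: (B, C, H, W); channel-wise re-weighting by the class center
    weight = torch.sigmoid(center).view(1, -1, 1, 1)
    return query_feats * weight

class QueryLink(nn.Module):
    """Cross-attention from query-image object queries to support object queries."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, obj_queries, support_queries):
        out, _ = self.attn(obj_queries, support_queries, support_queries)
        return obj_queries + out                                              # residual calibration

# Toy usage
S, B, C, H, W, Q = 3, 2, 64, 32, 32, 10
center = masked_class_centers(torch.randn(S, C, H, W), torch.randint(0, 2, (S, H, W)).float())
qf = reweight_query_features(torch.randn(B, C, H, W), center)
linked = QueryLink(C)(torch.randn(B, Q, C), torch.randn(B, Q, C))
```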
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
As one of the most important psychological stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis, and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Although several spontaneous ME datasets have recently been released to alleviate this problem, the amount of available data remains small. To address ME data hunger, we construct a dynamic spontaneous ME dataset, called DFME (Dynamic Facial Micro-expressions), with the largest ME data scale to date, which includes 7,526 well-labeled ME videos induced from 671 participants and annotated by more than 20 annotators over three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments and objectively verify the validity of the DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate research on automatic MER and provides a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
Face Anti-spoofing (FAS) is essential to secure face recognition systems against various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scenario, low image resolution and noise interference are the new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by incorporating a super-resolution network. (2) Generated sample pairs are used to simulate quality variance distributions, helping the contrastive learning strategy obtain robust feature representations under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
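One way to picture the quality-invariance objective in (2) is an InfoNCE-style loss over clean/degraded pairs; the sketch below uses a placeholder degradation and encoder and is not the CQIL implementation.

```python
# Illustrative sketch (not the authors' code): each face image and a quality-
# degraded copy of it (simulating low surveillance resolution and noise) form a
# positive pair, and a contrastive loss pushes their embeddings together while
# separating other samples. The degradation and encoder are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

def degrade(x, scale: int = 4, noise_std: float = 0.05):
    """Simulate quality variation: downsample, upsample back, add sensor-like noise."""
    low = F.interpolate(x, scale_factor=1 / scale, mode="bilinear", align_corners=False)
    rec = F.interpolate(low, size=x.shape[-2:], mode="bilinear", align_corners=False)
    return rec + noise_std * torch.randn_like(rec)

def quality_invariant_contrastive(encoder, images, temperature: float = 0.1):
    z1 = F.normalize(encoder(images), dim=-1)            # clean view
    z2 = F.normalize(encoder(degrade(images)), dim=-1)   # degraded view (positive)
    logits = z1 @ z2.t() / temperature                   # (B, B) similarity matrix
    targets = torch.arange(images.size(0))               # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage with a tiny stand-in encoder.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
loss = quality_invariant_contrastive(encoder, torch.randn(8, 3, 32, 32))
```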
When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a large range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise, and cross-device discrepancy. Then, we systematically investigate 11 LiDAR semantic segmentation models, spanning different input representations (e.g., point clouds, voxels, and projected images), network architectures, and training schemes. Through this study, we obtain two insights: 1) We find that the input representation plays a crucial role in robustness; different representations behave differently under specific corruptions. 2) Although state-of-the-art methods on LiDAR semantic segmentation achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on the above observations, we design a robust LiDAR segmentation model (RLSeg) which greatly boosts robustness with simple but effective modifications. It is promising that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.
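For intuition, a measurement-noise-style corruption on a LiDAR scan might look like the sketch below; the parameters and corruption choices are illustrative assumptions and are not taken from the SemanticKITTI-C benchmark code.

```python
# Illustrative sketch (not the benchmark code): applying simple out-of-domain
# corruptions to a LiDAR point cloud before feeding it to a frozen segmentation
# model and comparing clean vs. corrupted mIoU.
import numpy as np

def jitter_points(points: np.ndarray, sigma: float = 0.03) -> np.ndarray:
    """Add Gaussian noise to xyz coordinates to mimic range-measurement noise."""
    noisy = points.copy()
    noisy[:, :3] += np.random.normal(0.0, sigma, size=noisy[:, :3].shape)
    return noisy

def drop_points(points: np.ndarray, ratio: float = 0.2) -> np.ndarray:
    """Randomly drop a fraction of returns, mimicking beam loss or occlusion."""
    keep = np.random.rand(points.shape[0]) > ratio
    return points[keep]

# Toy usage: points as an (N, 4) array of x, y, z, intensity.
scan = np.random.randn(1000, 4).astype(np.float32)
corrupted = drop_points(jitter_points(scan, sigma=0.05), ratio=0.3)
# mIoU would then be computed per corruption type and compared against the clean score.
```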
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works utilize separate approaches to handle thing, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework named Panoptic-PartFormer. Moreover, we find the previous metric PartPQ is biased toward PQ. To handle both issues, we make the following contributions: First, we design a meta-architecture that decouples part features from thing/stuff features. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model Panoptic-PartFormer. Second, we propose a new metric, Part-Whole Quality (PWQ), to better measure the task from both pixel-region and part-whole perspectives. It can also decouple the errors of part segmentation and panoptic segmentation. Third, inspired by Mask2Former and based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross-attention scheme, implemented via masked cross attention, to further boost part segmentation quality. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with the previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70% in GFLOPs and 50% in parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
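The part-whole cross-attention can be sketched, in the spirit of Mask2Former's masked attention, as part queries attending to thing/stuff queries under a boolean mask; the code below is an assumption-laden illustration, not the Panoptic-PartFormer++ implementation.

```python
# Illustrative sketch (not the authors' code): a masked cross-attention step where
# part queries attend only to the thing/stuff query positions permitted by a
# predicted mask. The masking convention and shapes are assumptions.
import torch
import torch.nn as nn

class MaskedPartWholeAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, part_queries, whole_queries, attn_mask):
        # attn_mask: (B * heads, P, W) boolean; True = position is NOT allowed to attend
        out, _ = self.attn(part_queries, whole_queries, whole_queries, attn_mask=attn_mask)
        return self.norm(part_queries + out)

# Toy usage: 16 part queries attend to 8 thing/stuff queries under a sparse random mask.
B, P, W, D, H = 2, 16, 8, 256, 8
mask = torch.rand(B * H, P, W) > 0.9     # mask out roughly 10% of positions
layer = MaskedPartWholeAttention(D, H)
refined = layer(torch.randn(B, P, D), torch.randn(B, W, D), mask)
```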
Rankings are widely collected in various real-life scenarios, leading to the leakage of personal information such as users' preferences on videos or news. To protect rankings, existing works mainly develop privacy protection on a single ranking within a set of rankings or on pairwise comparisons of a ranking under $\epsilon$-differential privacy. This paper proposes a novel notion called $\epsilon$-ranking differential privacy for protecting ranks. We establish the connection between the Mallows model (Mallows, 1957) and the proposed $\epsilon$-ranking differential privacy. This allows us to develop a multistage ranking algorithm to generate synthetic rankings while satisfying the developed $\epsilon$-ranking differential privacy. Theoretical results regarding the utility of synthetic rankings in downstream tasks, including the inference attack and personalized ranking tasks, are established. For the inference attack, we quantify how $\epsilon$ affects the estimation of the true ranking based on synthetic rankings. For the personalized ranking task, we consider varying privacy preferences among users and quantify how their privacy preferences affect the consistency in estimating the optimal ranking function. Extensive numerical experiments are carried out to verify the theoretical results and demonstrate the effectiveness of the proposed synthetic ranking algorithm.
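As a concrete illustration of multistage sampling from a Mallows model, the repeated-insertion sketch below builds a synthetic ranking stage by stage; the mapping from the dispersion parameter to $\epsilon$-ranking differential privacy is the paper's contribution and is not reproduced here.

```python
# Illustrative sketch (not the paper's algorithm): sampling a synthetic ranking
# from a Mallows model centered at the true ranking via repeated insertion.
# A larger dispersion theta concentrates mass near the true ranking.
import numpy as np

def mallows_sample(center: list, theta: float, rng=None) -> list:
    """Multistage (repeated-insertion) sampling from Mallows(center, theta)."""
    rng = np.random.default_rng(rng)
    synthetic: list = []
    for i, item in enumerate(center, start=1):
        # insert the i-th item of the center ranking at position j in {1, ..., i},
        # with probability proportional to exp(-theta * (i - j))
        weights = np.exp(-theta * (i - np.arange(1, i + 1)))
        j = rng.choice(i, p=weights / weights.sum())
        synthetic.insert(j, item)
    return synthetic

# Toy usage: theta = 0 yields a uniformly random ranking (maximal noise, minimal utility);
# a large theta returns rankings close to the user's true ranking.
true_ranking = ["item_a", "item_b", "item_c", "item_d", "item_e"]
print(mallows_sample(true_ranking, theta=0.0, rng=1))
print(mallows_sample(true_ranking, theta=5.0, rng=1))
```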